Proposal: Cluster Scoped Resources #1400

Closed

Conversation

nikhildl12

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 14, 2017
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 14, 2017
@k8s-github-robot k8s-github-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Nov 14, 2017
@@ -0,0 +1,76 @@
# Cluster Scoped Resources
Member

please follow the KEP process outlined by @kubernetes/sig-architecture-feature-requests

Contributor

Is KEP now a requirement or a recommendation? That was not clear from the contributor summit discussions.

Member

/cc @jdumars

Author

Cluster scoped resources are consumable resources that do not belong to any specific node but instead are available across multiple nodes in a cluster. These resources are accounted for like other consumable resources and should be usable by the scheduler when deciding whether a pod can actually be scheduled.


## Motivation
Member

Software licenses are the most common reason for such features in other systems.

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. kind/feature Categorizes issue or PR as related to a new feature. labels Nov 22, 2017


## Motivation
Resources in Kubernetes such as CPU and memory are available at the node level and can be consumed by pods by requesting them. However, there are some resources that do not belong to a specific node but are consumable across all or a group of nodes in the cluster. As an example, IP addresses in a pool can be shared across pods running on multiple nodes within a network scope. Another use case is locally attached shared storage in a rack, which is consumable across several nodes. Hence there is a need to represent such a resource at the cluster level, consumable across all or a group of nodes in the cluster.


This is more like node-group scoped resources in these examples.

Contributor

Please add a list of 5-8 example resources that would be tracked like this. I’d like more validation and concrete discussion on each type to guide design.

Contributor

+1. There are many use cases for extending resource APIs and I'd like to first get a collection of use-cases before identifying possible solutions.

Author

Added a few use cases.

@davidopp
Member

cc/ @kubernetes/sig-scheduling-feature-requests
cc/ @vishh

@davidopp
Member

Thanks for writing this, it's definitely a feature we have been talking about for a while.

I think a complete solution to this problem should consider how the resource allocator for the cluster-level resource fits in. I think that cluster-scoped resources are likely to have some kind of external allocator, for example the agent that hands out IP addresses or software licenses. It's important for the scheduler's view of free resources to stay in sync with that of the external allocator, which has the authoritative information, so that we can minimize the likelihood that a container starts up and finds that the resource is not actually available.

For example, with a normal resource the scheduler assumes the resources become freed when the pod terminates or is deleted. But with cluster-level resources, if we leave the allocation and deallocation of the resource up to the container, it might be possible to leak resources (container forgets to release the IP address or license, or gets killed before it can, so the resource is still allocated but the scheduler thinks it is free because the pod has terminated). So maybe the scheduler should be responsible for reserving the resource from the allocator before binding the pod, and unreserving the resource via the allocator when the pod terminates. It's probably quite complicated to ensure that a container only tries to allocate resources that have been reserved for it, so it's probably not a "secure" solution but might be good enough.

One approach is what we did for PDB and ResourceQuota, where decrementing the amount free is synchronous with requesting it (in the cluster-scoped resource case, this would mean the scheduler decrements the free) but replenishing the resource when it is no longer in use is asynchronous and done by a separate controller (could be the agent that is responsible for the cluster-level resource, when a container deallocates the resource).
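To make that pattern concrete, here is a minimal Go sketch (all names are hypothetical; nothing here exists in Kubernetes): the scheduler-side path decrements the free quantity synchronously before binding, while a separate controller replenishes it asynchronously once the external allocator confirms the release.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// clusterResourcePool is a hypothetical in-memory view of a cluster-scoped
// resource's free quantity, shared between the scheduler and a replenishing
// controller.
type clusterResourcePool struct {
	mu   sync.Mutex
	free int64
}

// Reserve decrements the free quantity synchronously, before the pod is bound.
// It fails if not enough of the resource is currently free.
func (p *clusterResourcePool) Reserve(qty int64) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.free < qty {
		return errors.New("insufficient cluster-scoped resource")
	}
	p.free -= qty
	return nil
}

// Replenish is called asynchronously by a separate controller once the
// external allocator confirms the resource has actually been released.
func (p *clusterResourcePool) Replenish(qty int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.free += qty
}

func main() {
	pool := &clusterResourcePool{free: 5}
	if err := pool.Reserve(2); err != nil { // scheduler path: synchronous decrement
		fmt.Println("schedule failed:", err)
		return
	}
	fmt.Println("pod bound, free now:", pool.free)
	pool.Replenish(2) // controller path: asynchronous replenish after release
	fmt.Println("resource released, free now:", pool.free)
}
```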

}

// ClusterResource represents a resource which is available at a cluster level
type ClusterResource struct {
Contributor

If this is a form of quota, it should be named as such: ClusterResourceNodeQuota. It's not actually clear how this API aligns with ResourceQuota; please comment to that effect.

Author

ClusterResource is an API type that represents a cluster scoped resource. However, its integration with ResourceQuota needs to be added, probably at a later phase such as beta.
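For the sake of discussion, here is a rough sketch of what such a type could contain; the fields below are illustrative assumptions, not the fields from this PR's diff.

```go
// Illustrative only; these field names are assumptions, not the fields
// proposed in this PR.
package api

// ClusterResource represents a consumable resource available at the cluster
// (or node-group) level rather than on a single node.
type ClusterResource struct {
	// Name is the resource name, e.g. "example.com/licenses".
	Name string
	// NodeSelector restricts the group of nodes the resource is shared
	// across; empty means the whole cluster.
	NodeSelector map[string]string
	// Capacity is the total quantity of the resource.
	Capacity int64
	// Allocatable is the quantity currently available for scheduling.
	Allocatable int64
}
```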

// pkg/api/types.go:

// ClusterResourceQuantity represents quantity of a ClusterResource
type ClusterResourceQuantity struct {
Contributor

What does the discovery/initialization flow look like?

Author

The cluster admin or other controllers will post ClusterResource objects that capture the capacity and allocatable quantities of a ClusterResource, which will then be used by the scheduler.

}
```
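To illustrate the discovery/initialization flow described in the thread above, here is a small sketch of an admin or controller posting a ClusterResource object that the scheduler would then watch; the client interface is a hypothetical stand-in, not an existing client-go API.

```go
// Hypothetical sketch of the discovery/initialization flow; the client
// interface below is an assumption, not an existing client-go API.
package main

import "fmt"

// ClusterResource mirrors the illustrative type sketched earlier in this
// conversation.
type ClusterResource struct {
	Name        string
	Capacity    int64
	Allocatable int64
}

// clusterResourceClient stands in for whatever API client a cluster admin or
// controller would use to create ClusterResource objects.
type clusterResourceClient interface {
	Create(cr ClusterResource) error
}

// fakeClient just prints what would be posted to the API server.
type fakeClient struct{}

func (fakeClient) Create(cr ClusterResource) error {
	fmt.Printf("posted ClusterResource %q: capacity=%d allocatable=%d\n",
		cr.Name, cr.Capacity, cr.Allocatable)
	return nil
}

func main() {
	var c clusterResourceClient = fakeClient{}
	// An admin or external controller posts the object; the scheduler watches
	// these objects and uses Allocatable when making scheduling decisions.
	_ = c.Create(ClusterResource{
		Name:        "example.com/ip-addresses",
		Capacity:    256,
		Allocatable: 256,
	})
}
```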

`clusterInfo` is added to the scheduler cache to do accounting for ClusterResources consumed by pods. `clusterInfo` will be exposed to the predicate and priority functions in order to take ClusterResources into consideration when making scheduling decisions.
Contributor

How will `clusterInfo` be built?

Author

`clusterInfo` can be built similarly to how we build `nodeInfo`, since the scheduler will be watching ClusterResources.
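As a rough illustration of that idea, here is a sketch of a `clusterInfo`-style cache feeding a predicate-style check; names and signatures are assumptions, not the scheduler's actual code.

```go
// Hypothetical sketch of a clusterInfo-style cache and a predicate check.
package main

import "fmt"

// clusterInfo caches the allocatable and requested quantities of each
// cluster-scoped resource, analogous to nodeInfo for node-level resources.
type clusterInfo struct {
	allocatable map[string]int64 // from watched ClusterResource objects
	requested   map[string]int64 // accumulated from pods already scheduled
}

// fitsClusterResources is a predicate-style check: the pod fits only if every
// cluster-scoped resource it requests still has enough unallocated quantity.
func (c *clusterInfo) fitsClusterResources(podRequests map[string]int64) bool {
	for name, req := range podRequests {
		if c.requested[name]+req > c.allocatable[name] {
			return false
		}
	}
	return true
}

// assumePod updates the cache when the scheduler assumes the pod, mirroring
// how nodeInfo accounting works today.
func (c *clusterInfo) assumePod(podRequests map[string]int64) {
	for name, req := range podRequests {
		c.requested[name] += req
	}
}

func main() {
	ci := &clusterInfo{
		allocatable: map[string]int64{"example.com/licenses": 3},
		requested:   map[string]int64{},
	}
	req := map[string]int64{"example.com/licenses": 2}
	fmt.Println("fits:", ci.fitsClusterResources(req)) // true
	ci.assumePod(req)
	fmt.Println("fits again:", ci.fitsClusterResources(req)) // false: only 1 left
}
```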


ClusterResources are consumable by pods just like CPU and memory, by specifying them in the pod request. The scheduler should take care of the resource accounting for ClusterResources so that no more than the available amount is simultaneously allocated to pods. The prefix used to identify a ClusterResource could be
```
pod.alpha.kubernetes.io/cluster-resource-
Contributor

I'm not a fan of special prefixes. I'd like to see if we can avoid overloading resource names.

Contributor

+1, we just moved away from this pattern with extended resources.

Author

It can follow fully-qualified resource names similar to extended resources, but we need to see how those will be differentiated.
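To keep the naming discussion concrete, here is a small sketch contrasting the two options; the prefix is the alpha prefix from this proposal, and the helper function is purely illustrative.

```go
// Illustrative sketch contrasting the two naming options discussed here.
package main

import (
	"fmt"
	"strings"
)

const clusterResourcePrefix = "pod.alpha.kubernetes.io/cluster-resource-"

// isClusterResource shows how the scheduler could distinguish a cluster-scoped
// resource under the prefix option. With fully-qualified names (as used by
// extended resources), this check would not work, and the distinction would
// have to come from the ClusterResource/ResourceClass object instead.
func isClusterResource(name string) bool {
	return strings.HasPrefix(name, clusterResourcePrefix)
}

func main() {
	// Option 1: prefixed name embedded in the pod's resource requests.
	prefixed := clusterResourcePrefix + "ip-addresses"
	// Option 2: a fully-qualified name, indistinguishable from an extended
	// resource by the name alone.
	qualified := "example.com/ip-addresses"

	fmt.Println(prefixed, "->", isClusterResource(prefixed))   // true
	fmt.Println(qualified, "->", isClusterResource(qualified)) // false
}
```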

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nikhildl12
We suggest the following additional approver: davidopp

Assign the PR to them by writing /assign @davidopp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@nikhildl12
Author

@davidopp @timothysc @vishh: I would like to understand what the next steps could be for this proposal. As a first action item, I can submit this in the form of a KEP: https://github.com/kubernetes/community/blob/master/keps/0000-kep-template.md

@cblecker
Member

@nikhildl12 One important step is to sort out your CLA, as outlined here: #1400 (comment)

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 29, 2018
pod.alpha.kubernetes.io/cluster-resource-
```

### Accounting in scheduler
@bsalamat Jan 30, 2018
Member

We have extended resources and several types of first-class resources in the scheduler already. I think it would be possible to come up with a single representation that covers all of these types. For example, I don't see much of a difference between a cluster resource and an extended resource from the scheduler's point of view. An extended resource with an additional "type" could represent a cluster resource.

Author

The key difference between these two resources is the "scope".
Currently, extended resources are exposed as part of node status because they are tied to a node, while cluster scoped resources have to be represented outside the scope of a node. But we can surely have a comprehensive API that covers both. From the scheduler's point of view, it will need some additional logic to calculate and cache the available capacity of a cluster scoped resource across a set of nodes.

Member

Yes, the "ResourceClass" that @jiayingz is working on is an effort in that direction to provide a comprehensive API to represent various types of resources, including cluster resources.

Contributor

Yes. As @bsalamat mentioned, we are working on a new Resource API proposal that aims to provide a comprehensive API for both node-level resources and cluster-level resources. Here is the current PR:
#782
It is still WIP and the current plan is to focus on node-level resources during the initial phase. But I think even the initial API should help solve some of the listed problems here. Please take a look and let us know if you see any missing pieces.
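A purely illustrative sketch of a unified representation with an explicit scope, along the lines discussed in this thread; the names below are assumptions and not the ResourceClass API being proposed in #782.

```go
// Purely illustrative; these names are assumptions, not the ResourceClass API.
package api

// ResourceScope distinguishes where a resource is accounted.
type ResourceScope string

const (
	NodeScope    ResourceScope = "Node"    // tied to a single node, like extended resources
	ClusterScope ResourceScope = "Cluster" // shared across a group of nodes
)

// ScopedResource could describe both cases; for ClusterScope the scheduler
// would additionally calculate and cache availability across the selected
// set of nodes.
type ScopedResource struct {
	Name         string
	Scope        ResourceScope
	NodeSelector map[string]string // only meaningful for ClusterScope
	Capacity     int64
	Allocatable  int64
}
```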


### Accounting in scheduler

ClusterResources should be tracked as normal consumable resources and should be considered by the scheduler when determining whether a pod can actually be scheduled.
@bsalamat Jan 30, 2018
Member

Another important aspect of cluster resources which is not covered here is how to bind these resources to a chosen node during/after scheduling. Fairly complex logic has already been added to the scheduler to handle provisioning and binding PVs to nodes during scheduling. Similar processes may be needed for other resources, such as TPUs. I think that aspect should be covered by the proposal.

@vishh @jiayingz

Author

To cover that aspect, I prefer the approach mentioned by @davidopp in his previous comment. The external agent/controller which exposes the available capacity of this resource can be made responsible for binding, or for making sure that those resources are ready to use, when a pod is going to run on a node. Similarly, when a pod dies, that agent needs to deallocate/unbind the corresponding resource and increment the available quantity so that it can be used for scheduling new pods.
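As a rough sketch of that division of responsibility (all types and interfaces below are assumptions), the agent binds the resource before the pod runs, and releases it and replenishes the available quantity when the pod dies.

```go
// Hypothetical sketch of the external agent/controller described above; the
// event and allocator interfaces are assumptions for illustration only.
package main

import "fmt"

// podEvent is a simplified notification the agent reacts to.
type podEvent struct {
	pod      string
	started  bool  // true: pod is about to run; false: pod terminated
	quantity int64 // amount of the cluster-scoped resource the pod requested
}

// allocator is whatever system actually hands out the resource
// (IP pool manager, license server, storage controller, ...).
type allocator interface {
	Bind(pod string, qty int64) error
	Release(pod string, qty int64) error
}

// agent keeps the advertised available quantity in sync with the allocator,
// so the scheduler's view does not drift from the authoritative one.
type agent struct {
	alloc       allocator
	allocatable int64
}

func (a *agent) handle(ev podEvent) error {
	if ev.started {
		// Bind/prepare the resource before the pod runs on its node. The
		// matching decrement of the available quantity happened at scheduling
		// time, so it is not repeated here.
		return a.alloc.Bind(ev.pod, ev.quantity)
	}
	// The pod died: unbind the resource and make the quantity schedulable again.
	if err := a.alloc.Release(ev.pod, ev.quantity); err != nil {
		return err
	}
	a.allocatable += ev.quantity
	// In a real controller this would be written back to the ClusterResource
	// object the scheduler watches.
	fmt.Println("allocatable now:", a.allocatable)
	return nil
}

// noopAllocator is a stand-in used only to make the example runnable.
type noopAllocator struct{}

func (noopAllocator) Bind(pod string, qty int64) error    { return nil }
func (noopAllocator) Release(pod string, qty int64) error { return nil }

func main() {
	// 3 units were already reserved by the scheduler when web-0 was scheduled.
	a := &agent{alloc: noopAllocator{}, allocatable: 7}
	_ = a.handle(podEvent{pod: "web-0", started: true, quantity: 3})
	_ = a.handle(podEvent{pod: "web-0", started: false, quantity: 3})
}
```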

@k8s-github-robot k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Feb 6, 2018
@timothysc timothysc removed their assignment Apr 13, 2018
@timothysc timothysc dismissed their stale review April 13, 2018 21:29

out of date

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 11, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@krmayankk

Why has this been abandoned? The proposal seems fair.
